WebSocket Error Handling: Close Codes and Recovery

Handshake failures: errors before the connection exists

Before your WebSocket connection is established, the HTTP upgrade handshake can fail. The server returns an HTTP error code (403, 502, etc.) but the browser does not expose it to JavaScript. You get an onerror followed by onclose with code 1006 and an empty reason.

Common causes: the server rejecting the request's Origin header (WebSocket handshakes are not subject to CORS, so Access-Control headers do not apply; the server has to check Origin itself), a proxy that does not forward the Upgrade header (older HTTP/1.0 proxies), authentication failure during the upgrade, or the server rejecting the connection due to rate limiting. Debug these with browser DevTools: the Network tab shows the HTTP upgrade request and the server’s response status. See Connection Refused for a full troubleshooting guide.
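
Because the HTTP status is hidden, the only client-side hint of a handshake failure is a 1006 close that arrives before any open event. A minimal sketch of that check follows. The `describeClose` and `watchHandshake` helpers and their return labels are hypothetical names for illustration, not part of the WebSocket API:

```javascript
// Hypothetical helper: labels a close using only what the browser exposes.
// `opened` records whether onopen fired before the close arrived.
function describeClose(opened, code) {
  if (!opened && code === 1006) {
    // The upgrade never completed; the HTTP status is visible only in DevTools.
    return "handshake_failed";
  }
  if (code === 1006) {
    return "dropped_after_open";
  }
  return "closed_with_code";
}

// Wires the helper to a socket. The constructor is injectable so the sketch
// can run outside a browser; by default it uses the global WebSocket.
function watchHandshake(url, onResult, WebSocketImpl = globalThis.WebSocket) {
  const ws = new WebSocketImpl(url);
  let opened = false;
  ws.onopen = () => {
    opened = true;
  };
  ws.onclose = (event) => onResult(describeClose(opened, event.code));
  return ws;
}
```

A "handshake_failed" result is a cue to open the Network tab, since nothing more specific is available to the script.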

The browser gives you nothing useful on error

The WebSocket onerror event is one of the most misleading APIs in the browser. It fires when something goes wrong, but provides no error code, no message, and no details about what happened:

const ws = new WebSocket("wss://example.com/ws");
ws.onerror = (event) => {
  // event.message → undefined
  // event.code → undefined
  // event.reason → undefined
  console.log(event); // Generic Event object. Useless.
};

This is by design. The browser hides error details to prevent scripts from probing internal network infrastructure. A malicious page cannot distinguish “server refused the connection” from “firewall blocked port 443” — and that is a security feature, not a bug.

The real signal is in onclose. When a WebSocket connection fails or is terminated, onclose fires with a numeric close code and, if the peer sent one, a reason string. The code tells you what category of failure occurred:

ws.onclose = (event) => {
  console.log(event.code); // 1006, 1008, 1011, etc.
  console.log(event.reason); // Human-readable explanation (may be empty)
};

Practical rule: use onerror only for logging that an error occurred. Use onclose for all decision-making — whether to retry, what to tell the user, and what to report to your monitoring system.

Not all close codes are equal. Some mean “try again,” others mean “stop trying and fix your code.” Treating them the same — either retrying everything or surfacing every error to the user — is the most common error handling mistake in production WebSocket code.

These indicate temporary problems. Retry with exponential backoff:

| Code | Name | Meaning |
|------|------|---------|
| 1006 | Abnormal Close | Network dropped, no close frame received |
| 1011 | Internal Error | Server hit an unexpected condition |
| 1012 | Service Restart | Server is restarting, come back soon |
| 1013 | Try Again Later | Server is overloaded, back off |

These indicate a problem that retrying will not fix. Surface the error and fix the underlying cause:

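| Code | Name | Meaning |
|------|------|---------|
| 1008 | Policy Violation | Auth failed, invalid origin, banned |
| 1003 | Unsupported Data | Server cannot handle the data type |
| 1002 | Protocol Error | Malformed frame, protocol violation |

These are normal closures and need no action:

| Code | Name | Meaning |
|------|------|---------|
| 1000 | Normal Closure | Clean shutdown, both sides agreed |
| 1001 | Going Away | Server shutting down, or the page navigating away |
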
function classifyClose(code) {
  switch (code) {
    case 1000:
    case 1001:
      return "normal";
    case 1006:
    case 1011:
    case 1012:
    case 1013:
      return "transient";
    case 1002:
    case 1003:
    case 1008:
      return "permanent";
    default:
      // 4000-4999: application-defined codes
      return code >= 4000 ? "application" : "transient";
  }
}

ws.onclose = (event) => {
  const type = classifyClose(event.code);
  if (type === "transient") {
    scheduleReconnect(); // backoff + retry
  } else if (type === "permanent") {
    showError(event.code, event.reason);
  }
  // 'normal': do nothing.
  // 'application': falls through here; handle it according to your own protocol.
};

For the full list of close codes and their meanings, see the WebSocket Close Codes Reference.
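
Codes in the 4000-4999 range carry whatever meaning your own server gives them, so they need their own lookup. A minimal sketch of one way to do that follows. The codes 4001 and 4002, their meanings, and the `classifyApplicationClose` name are all hypothetical and must be replaced with whatever your server actually sends:

```javascript
// Hypothetical application-defined close codes. The values and meanings are
// illustrative only; they are not defined by any standard.
const APP_CLOSE_POLICY = new Map([
  [4001, "permanent"], // e.g. session expired: re-authenticate before reconnecting
  [4002, "transient"], // e.g. server asked the client to reconnect
]);

function classifyApplicationClose(code) {
  if (code < 4000 || code > 4999) {
    throw new RangeError(`not an application close code: ${code}`);
  }
  // Unknown application codes default to "permanent" so that a newly added
  // server-side code cannot trigger an accidental retry loop.
  return APP_CLOSE_POLICY.get(code) ?? "permanent";
}
```

Defaulting unknown codes to "permanent" is a design choice that favors surfacing an error over retrying blindly. Flip it if your server uses application codes mostly for transient conditions.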

function createConnection(url, handlers) {
  const ws = new WebSocket(url);
  ws.onerror = () => {
    // Log only: no useful information here
    handlers.onError?.("connection_error");
  };
  ws.onclose = (event) => {
    const type = classifyClose(event.code);
    handlers.onClose?.(event.code, event.reason, type);
  };
  ws.onmessage = (event) => {
    try {
      const data = JSON.parse(event.data);
      handlers.onMessage?.(data);
    } catch (e) {
      // Bad message: log it, do NOT close the connection
      handlers.onParseError?.(event.data, e);
    }
  };
  return ws;
}

Node.js gives you more error detail than the browser. The error event fires with an actual Error object, and you can distinguish connection-phase errors from message-phase errors:

const WebSocket = require("ws");
const ws = new WebSocket("wss://example.com/ws");
ws.on("error", (err) => {
  // Unlike browser, err has useful properties
  if (err.code === "ECONNREFUSED") {
    // Server is down: retry
  } else if (err.code === "ENOTFOUND") {
    // DNS failure: permanent, check your URL
  }
});
ws.on("close", (code, reason) => {
  const type = classifyClose(code);
  // Same classification logic as browser
});

The same classification in Python with the websockets library (handle_message and reconnect_with_backoff are placeholders for your own functions):

import json
import logging

import websockets
from websockets.exceptions import ConnectionClosed

log = logging.getLogger(__name__)

async def connect(uri):
    try:
        async with websockets.connect(uri) as ws:
            async for message in ws:
                try:
                    data = json.loads(message)
                    handle_message(data)
                except json.JSONDecodeError:
                    log.warning("Bad message, skipping")
    except ConnectionClosed as e:
        if e.code in (1006, 1011, 1012, 1013):
            await reconnect_with_backoff()
        else:
            raise  # Permanent error: propagate
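
And in Go with gorilla/websocket:

_, msg, err := conn.ReadMessage()
if err != nil {
	if websocket.IsCloseError(err,
		websocket.CloseNormalClosure,
		websocket.CloseGoingAway) {
		return // Clean shutdown
	}
	// IsCloseError matches the listed codes. IsUnexpectedCloseError does the
	// opposite: it matches every close code except the ones listed.
	if websocket.IsCloseError(err,
		websocket.CloseAbnormalClosure,
		websocket.CloseInternalServerErr,
		websocket.CloseServiceRestart,
		websocket.CloseTryAgainLater) {
		log.Printf("transient close: %v", err)
		scheduleReconnect()
		return
	}
	log.Printf("unexpected error: %v", err)
	return
}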

Closing the connection on a message parse error

A single bad message does not mean the connection is broken. The transport is fine — one message was malformed. Log it and move on:

// Wrong: kills a working connection
ws.onmessage = (event) => {
  try {
    handle(JSON.parse(event.data));
  } catch (e) {
    ws.close(1002, "invalid message"); // Don't do this
  }
};

// Right: skip the bad message, keep the connection
ws.onmessage = (event) => {
  try {
    handle(JSON.parse(event.data));
  } catch (e) {
    logParseError(event.data, e);
  }
};

No distinction between transient and permanent errors

Retrying an authentication failure (code 1008) with exponential backoff will retry forever and never succeed. The token is invalid or the origin is blocked — retrying the same request changes nothing. Classify the error first, then decide whether to retry.

Always set a ceiling. Either a maximum retry count (10-15 attempts) or a maximum elapsed time (2-5 minutes). After the limit, surface a “connection lost” state to the user and let them retry manually. On mobile, unbounded retries drain battery with zero benefit. See the reconnection guide for backoff implementation.
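
A minimal sketch of such a ceiling, combining a retry count and an elapsed-time limit. The two limits come from the ranges above. The base delay, the delay cap, the use of full jitter, and the `nextRetryDelay` name are assumptions made for illustration:

```javascript
const MAX_ATTEMPTS = 10; // from the suggested 10-15 range
const MAX_ELAPSED_MS = 2 * 60 * 1000; // from the suggested 2-5 minute range
const BASE_DELAY_MS = 500; // assumption: not specified in the text
const MAX_DELAY_MS = 30_000; // assumption: not specified in the text

// Returns the delay before the next attempt, or null once a ceiling is hit.
// `attempt` is zero-based; `elapsedMs` is the time since the connection was lost.
function nextRetryDelay(attempt, elapsedMs, random = Math.random) {
  if (attempt >= MAX_ATTEMPTS || elapsedMs >= MAX_ELAPSED_MS) {
    return null; // give up: show "connection lost" and offer a manual retry
  }
  const exponential = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  // Full jitter spreads reconnects out so clients do not retry in lockstep.
  return Math.floor(random() * exponential);
}
```

A caller would schedule the next attempt with `setTimeout` when the result is a number and switch the UI to a "connection lost" state when it is `null`.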

console.log is not monitoring. In production, you need:

  • Structured logging with close codes, connection duration, and timestamps
  • Metrics tracking error rates and close code distribution
  • Alerting on sustained error rate spikes

You do not need a full observability stack on day one, but you need more than console.log.
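
As a starting point, a structured close record might look like the sketch below. The field names and the `closeLogRecord` and `logClose` helpers are illustrative assumptions; align them with whatever schema your logging pipeline expects:

```javascript
// Builds one structured log entry for a close event.
// `connectedAtMs` is the timestamp recorded in onopen, or null if the
// socket never opened.
function closeLogRecord(code, reason, connectedAtMs, nowMs = Date.now()) {
  return {
    event: "websocket_close",
    code,
    reason: reason || null,
    // Null when the socket never opened (for example, a handshake failure).
    durationMs: connectedAtMs == null ? null : nowMs - connectedAtMs,
    timestamp: new Date(nowMs).toISOString(),
  };
}

// Emits one JSON object per line, which most log pipelines can parse.
// The writer is injectable; it defaults to console.log.
function logClose(record, write = console.log) {
  write(JSON.stringify(record));
}
```

Swapping the `write` argument for a call to your log shipper or metrics client is enough to move from console output to something you can query and alert on.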

When the connection drops, outbound messages have nowhere to go. Without buffering, they are silently lost. A bounded buffer prevents data loss during short disconnections:

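class MessageBuffer {
  constructor({ maxSize = 100, maxBytes = 1_048_576 } = {}) {
    this.queue = []; // entries are { message, size }
    this.bytes = 0;
    this.maxSize = maxSize;
    this.maxBytes = maxBytes;
  }
  push(message) {
    // Approximate size: string length counts UTF-16 code units, not bytes
    const size = JSON.stringify(message).length;
    // A message that can never fit is rejected instead of emptying the queue
    if (size > this.maxBytes || this.maxSize < 1) return false;
    // Drop oldest entries until the new message fits both limits
    while (
      this.queue.length >= this.maxSize ||
      this.bytes + size > this.maxBytes
    ) {
      this.bytes -= this.queue.shift().size;
    }
    this.queue.push({ message, size });
    this.bytes += size;
    return true;
  }
  flush(sendFn) {
    while (this.queue.length > 0) {
      // Send before removing, so a throwing sendFn leaves the message queued
      sendFn(this.queue[0].message);
      this.bytes -= this.queue.shift().size;
    }
  }
}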

Key decisions:

  • Bound the buffer — 100 messages or 1MB, whichever comes first. An unbounded buffer will consume all available memory on a long disconnection
  • Drop oldest first — newest messages are usually more relevant than stale ones
  • Flush on reconnect — after the new connection is established, drain the buffer before sending new messages

For full reconnection patterns including server-side message replay and sequence tracking, see the reconnection guide.

Close code distribution over time is the single most useful WebSocket metric you can track. It tells you exactly what category of failure is happening:

  • Spike in 1006 — network-level issue. Check load balancer health, proxy timeouts, or a network partition
  • Spike in 1008 — authentication bug. A deploy may have broken token validation or changed the allowed-origin policy
  • Spike in 1011 — server-side crash. Check your application logs for unhandled exceptions
  • Spike in 1012 — expected during deploys. If it persists, your deploy is stuck

Track error rate per connection, not just globally. A high global error rate might be one user with a bad network generating thousands of reconnections. Per-connection rates separate infrastructure problems from individual client issues.

Set alerts on sustained error rates above your baseline. A brief spike during a deploy is normal. A sustained elevation over 10-15 minutes means something is broken. You do not need a full observability platform to start — even basic counters by close code in your existing metrics system will catch most WebSocket incidents before users report them.
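
Basic counters of this kind can be sketched in a few lines. The `CloseCodeStats` class below is a hypothetical in-memory example that tallies closes by code and by connection over a sliding window. The 15-minute default and the `connectionId` parameter are assumptions, and a real deployment would export these counts to its metrics system instead of keeping them in process:

```javascript
// Counts close events by code and by connection within a sliding window.
class CloseCodeStats {
  constructor(windowMs = 15 * 60 * 1000) {
    this.windowMs = windowMs;
    this.events = []; // { at, code, connectionId }, oldest first
  }

  record(code, connectionId, at = Date.now()) {
    this.events.push({ at, code, connectionId });
  }

  // Drops events older than the window, then tallies what remains.
  snapshot(now = Date.now()) {
    const cutoff = now - this.windowMs;
    this.events = this.events.filter((e) => e.at >= cutoff);
    const byCode = {};
    const byConnection = {};
    for (const { code, connectionId } of this.events) {
      byCode[code] = (byCode[code] ?? 0) + 1;
      byConnection[connectionId] = (byConnection[connectionId] ?? 0) + 1;
    }
    return { total: this.events.length, byCode, byConnection };
  }
}
```

Comparing `byCode` against `byConnection` shows whether a spike is spread across many connections or concentrated in one noisy client.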

Why does the WebSocket onerror event not give any details?

The browser hides error details from JavaScript for security. If onerror exposed network-level information like “connection refused on port 8080” or “TLS handshake failed,” a malicious script could probe internal network infrastructure. The close code in onclose provides the information you need — what type of failure occurred, without leaking how your network is structured. Use onerror for logging and onclose for decision-making.

Which WebSocket close codes should I retry on?

Retry on codes that indicate temporary conditions: 1006 (abnormal closure — usually a network drop), 1011 (server internal error), 1012 (server restart), and 1013 (try again later). Do not retry on 1008 (policy violation — typically auth failure), 1003 (unsupported data), or 1002 (protocol error). These indicate a bug in your code or configuration, not a transient problem. See the close codes reference for the full list.

How do I buffer messages during a WebSocket disconnection?

Queue outbound messages in a bounded in-memory buffer during disconnection. Set both a message count limit (100 messages) and a byte size limit (1MB). When the buffer overflows, drop the oldest messages — newer data is usually more relevant. On reconnect, flush the entire buffer before sending new messages. Pair this with idempotency keys on each message so the server can deduplicate if a message was sent but not acknowledged before the connection dropped.
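
The idempotency-key pairing can be sketched as follows. The `{ id, payload }` envelope shape, the key format, and both helper names are assumptions for illustration, and the server-side part is shown in JavaScript only to keep the example in one language:

```javascript
// Client side: tag each outbound message with a key that stays the same
// when the message is resent after a reconnect.
function createEnvelopeFactory(clientId) {
  let sequence = 0;
  return (payload) => ({ id: `${clientId}:${++sequence}`, payload });
}

// Server side: remember recently seen keys and skip repeats.
class Deduplicator {
  constructor(maxKeys = 1000) {
    this.seen = new Set();
    this.maxKeys = maxKeys;
  }

  // Returns true the first time a key is seen, false for a duplicate.
  accept(id) {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    if (this.seen.size > this.maxKeys) {
      // Sets iterate in insertion order, so this evicts the oldest key.
      this.seen.delete(this.seen.values().next().value);
    }
    return true;
  }
}
```

The buffer would store the whole envelope, not just the payload, so that a message flushed after reconnect carries the same `id` as the copy that may already have reached the server.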

What should I monitor for WebSocket errors in production?

Track the distribution of close codes over time. Each code maps to a failure category: 1006 is network, 1008 is auth, 1011 is server crash. A change in distribution tells you what broke. Also track error rate per connection to distinguish infrastructure-wide problems from one client with a bad network hammering your reconnection endpoint. Alert on sustained rate increases, not individual spikes.

Should I close the WebSocket on a message parse error?

No. A malformed message means one message was bad, not that the connection is broken. Log the error with the raw message payload for debugging, skip that message, and keep processing. Closing the connection forces a full reconnection cycle — DNS lookup, TCP handshake, TLS negotiation, WebSocket upgrade — all because of one bad JSON payload. Reserve connection closure for actual transport or protocol failures.