Kubernetes Ingress WebSocket Configuration
Kubernetes Ingress controllers can handle WebSocket connections with proper configuration, though they require specific settings to accommodate the unique characteristics of the WebSocket protocol. Unlike traditional HTTP requests, WebSocket connections are long-lived, stateful, and require protocol upgrade handling. This comprehensive guide covers the most popular ingress controllers: NGINX, Traefik, and HAProxy, plus service mesh integration with Istio and Linkerd. We’ll also explore deployment strategies, monitoring approaches, and troubleshooting techniques to ensure robust WebSocket implementations in production Kubernetes environments.
Quick Start: NGINX Ingress
The most common configuration for WebSocket support in Kubernetes:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '3600'
    nginx.ingress.kubernetes.io/proxy-connect-timeout: '3600'
spec:
  ingressClassName: nginx
  rules:
    - host: ws.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: websocket-service
                port:
                  number: 8080
NGINX Ingress Controller
NGINX Ingress Controller is the most widely adopted solution for WebSocket routing in Kubernetes environments. It provides excellent performance, extensive configuration options, and robust support for WebSocket protocol upgrades. The controller automatically detects WebSocket upgrade requests and handles the protocol switching seamlessly, making it an ideal choice for production deployments where reliability and performance are critical.
Installation
# Using Helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.config.proxy-read-timeout="3600" \
  --set controller.config.proxy-send-timeout="3600" \
  --set controller.config.use-proxy-protocol="false"

# Or using kubectl
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.1/deploy/static/provider/cloud/deploy.yaml
WebSocket Configuration
Complete NGINX Ingress configuration for WebSocket:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-ingress
  namespace: default
  annotations:
    # WebSocket specific timeouts
    nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '3600'
    nginx.ingress.kubernetes.io/proxy-connect-timeout: '3600'

    # Buffering settings
    nginx.ingress.kubernetes.io/proxy-buffering: 'off'
    nginx.ingress.kubernetes.io/proxy-request-buffering: 'off'

    # Consistent hashing by client IP (WebSocket upgrade headers are auto-detected)
    nginx.ingress.kubernetes.io/upstream-hash-by: '$remote_addr'

    # SSL configuration
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'

    # Backend protocol
    nginx.ingress.kubernetes.io/backend-protocol: 'HTTP'

    # Session affinity for WebSocket
    nginx.ingress.kubernetes.io/affinity: 'cookie'
    nginx.ingress.kubernetes.io/affinity-mode: 'persistent'
    nginx.ingress.kubernetes.io/session-cookie-name: 'ws-server'
    nginx.ingress.kubernetes.io/session-cookie-expires: '86400'
    nginx.ingress.kubernetes.io/session-cookie-max-age: '86400'
    nginx.ingress.kubernetes.io/session-cookie-path: '/'

    # Rate limiting
    nginx.ingress.kubernetes.io/limit-rps: '10'
    nginx.ingress.kubernetes.io/limit-connections: '100'

    # CORS settings
    nginx.ingress.kubernetes.io/enable-cors: 'true'
    nginx.ingress.kubernetes.io/cors-allow-origin: '*'
    nginx.ingress.kubernetes.io/cors-allow-methods: 'GET, PUT, POST, DELETE, PATCH, OPTIONS'
    nginx.ingress.kubernetes.io/cors-allow-headers: 'DNT,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization'
    nginx.ingress.kubernetes.io/cors-max-age: '1728000'
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ws.example.com
      secretName: websocket-tls
  rules:
    - host: ws.example.com
      http:
        paths:
          - path: /ws
            pathType: Prefix
            backend:
              service:
                name: websocket-service
                port:
                  number: 8080
ConfigMap for Global Settings
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  # Global WebSocket settings
  proxy-read-timeout: '3600'
  proxy-send-timeout: '3600'
  proxy-connect-timeout: '30'

  # Buffer settings
  proxy-buffering: 'off'
  proxy-buffer-size: '4k'
  proxy-buffers: '8 4k'
  proxy-busy-buffers-size: '8k'
  proxy-max-temp-file-size: '1024m'

  # Keepalive settings
  upstream-keepalive-connections: '320'
  upstream-keepalive-requests: '10000'
  upstream-keepalive-timeout: '60'

  # Worker settings
  worker-processes: 'auto'
  worker-connections: '10240'

  # Rate limiting
  limit-req-status-code: '429'
  limit-conn-status-code: '429'

  # Logging
  log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for" $proxy_upstream_name $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id'

  # SSL settings
  ssl-protocols: 'TLSv1.2 TLSv1.3'
  ssl-ciphers: 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256'
  ssl-prefer-server-ciphers: 'true'

  # HTTP/2 settings
  use-http2: 'true'
  http2-max-field-size: '16k'
  http2-max-header-size: '32k'
Custom Server Snippets
For advanced configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-ingress
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
      location ~* /ws {
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;

        # WebSocket specific
        proxy_buffering off;
        proxy_request_buffering off;

        # Timeouts
        proxy_connect_timeout 7d;
        proxy_send_timeout 7d;
        proxy_read_timeout 7d;
      }
Traefik Ingress Controller
Traefik offers a modern, cloud-native approach to ingress management with automatic service discovery and excellent WebSocket support. One of Traefik’s key advantages is its ability to automatically detect WebSocket connections without requiring explicit configuration, making it particularly suitable for dynamic environments where services are frequently added or modified. Traefik’s middleware system provides powerful traffic shaping, circuit breaker, and rate limiting capabilities specifically designed for long-lived WebSocket connections.
Installation
# Using Helm
helm repo add traefik https://helm.traefik.io/traefik
helm repo update

helm install traefik traefik/traefik \
  --namespace traefik \
  --create-namespace \
  --set ports.websocket.port=8080 \
  --set ports.websocket.expose=true \
  --set ports.websocket.protocol=TCP
Traefik IngressRoute for WebSocket
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: websocket-ingressroute
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`ws.example.com`)
      kind: Rule
      services:
        - name: websocket-service
          port: 8080
      # WebSocket is auto-detected by Traefik
      middlewares:
        - name: websocket-headers
  tls:
    secretName: websocket-tls
---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: websocket-headers
  namespace: default
spec:
  headers:
    customRequestHeaders:
      X-Forwarded-Proto: https
    customResponseHeaders:
      X-Frame-Options: SAMEORIGIN
      X-Content-Type-Options: nosniff
    sslRedirect: true
    sslForceHost: true
---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: websocket-ratelimit
  namespace: default
spec:
  rateLimit:
    average: 100
    period: 1m
    burst: 50
---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: websocket-circuit-breaker
  namespace: default
spec:
  circuitBreaker:
    expression: ResponseCodeRatio(500, 600, 0, 600) > 0.30
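Note that the rate-limit and circuit-breaker middlewares above take effect only once a route references them. To attach all three, the middlewares list in the IngressRoute would be extended as in this fragment of the routes section shown earlier:

routes:
  - match: Host(`ws.example.com`)
    kind: Rule
    services:
      - name: websocket-service
        port: 8080
    middlewares:
      - name: websocket-headers
      - name: websocket-ratelimit
      - name: websocket-circuit-breaker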
Traefik Sticky Sessions
apiVersion: traefik.containo.us/v1alpha1
kind: ServersTransport
metadata:
  name: websocket-transport
  namespace: default
spec:
  serverName: websocket-backend
  insecureSkipVerify: true
  maxIdleConnsPerHost: 100
  forwardingTimeouts:
    dialTimeout: 30s
    responseHeaderTimeout: 3600s
    idleConnTimeout: 3600s
---
apiVersion: traefik.containo.us/v1alpha1
kind: TraefikService
metadata:
  name: websocket-sticky
  namespace: default
spec:
  weighted:
    sticky:
      cookie:
        name: websocket_server
        secure: true
        httpOnly: true
        sameSite: strict
    services:
      - name: websocket-service
        port: 8080
        weight: 1
HAProxy Ingress Controller
HAProxy Ingress Controller brings enterprise-grade load balancing capabilities to Kubernetes with exceptional WebSocket support and advanced traffic management features. HAProxy excels in scenarios requiring precise control over connection distribution, sophisticated health checking, and enterprise security requirements. Its mature connection pooling algorithms and configurable timeout settings make it particularly well-suited for applications with varying WebSocket traffic patterns and strict performance requirements.
Installation
# Using Helm
helm repo add haproxytech https://haproxytech.github.io/helm-charts
helm repo update

helm install haproxy-ingress haproxytech/kubernetes-ingress \
  --namespace haproxy-ingress \
  --create-namespace \
  --set controller.config.timeout-client=3600s \
  --set controller.config.timeout-server=3600s \
  --set controller.config.timeout-connect=30s \
  --set controller.config.timeout-tunnel=3600s
HAProxy WebSocket Configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-ingress
  namespace: default
  annotations:
    haproxy.org/timeout-tunnel: '3600s'
    haproxy.org/load-balance: 'leastconn'
    haproxy.org/cookie-persistence: 'ws-server'
    haproxy.org/check: 'true'
    haproxy.org/check-http: '/health'
    haproxy.org/forwarded-for: 'true'
spec:
  ingressClassName: haproxy
  rules:
    - host: ws.example.com
      http:
        paths:
          - path: /ws
            pathType: Prefix
            backend:
              service:
                name: websocket-service
                port:
                  number: 8080
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-configmap
  namespace: haproxy-ingress
data:
  timeout-connect: '30s'
  timeout-client: '3600s'
  timeout-server: '3600s'
  timeout-tunnel: '3600s'
  timeout-http-request: '30s'
  timeout-http-keep-alive: '60s'
  timeout-queue: '30s'

  # WebSocket detection
  option-http-server-close: 'false'
  option-forwardfor: 'true'

  # Load balancing
  balance-algorithm: 'leastconn'

  # Health checks
  check-interval: '10s'
  check-timeout: '5s'
  check-rise: '2'
  check-fall: '3'

  # Session persistence
  cookie: 'SERVERID insert indirect nocache'

  # Rate limiting
  rate-limit-sessions: '100'
  rate-limit-period: '10s'
HAProxy Advanced Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-backend-config
  namespace: default
data:
  websocket-backend: |
    # Backend configuration for WebSocket
    backend websocket_backend
        mode http
        balance leastconn

        # WebSocket support
        option http-server-close
        option forwardfor

        # Timeouts for long-lived connections
        timeout server 3600s
        timeout tunnel 3600s
        timeout connect 30s

        # Health checks
        option httpchk GET /health HTTP/1.1\r\nHost:\ websocket
        http-check expect status 200

        # Sticky sessions
        cookie SERVERID insert indirect nocache

        # Servers with WebSocket support
        server ws1 websocket-pod-1:8080 check cookie ws1
        server ws2 websocket-pod-2:8080 check cookie ws2
        server ws3 websocket-pod-3:8080 check cookie ws3

        # Connection limits
        maxconn 10000

        # Queue settings
        timeout queue 30s
        option redispatch
        retries 3
Service Mesh Integration
Service mesh technologies like Istio and Linkerd provide sophisticated traffic management, security, and observability features that complement WebSocket deployments in Kubernetes. These platforms offer advanced capabilities including mutual TLS encryption, traffic splitting for A/B testing, circuit breaking, and comprehensive metrics collection. When properly configured, service meshes can significantly enhance the reliability and security of WebSocket applications while providing detailed visibility into connection patterns and performance characteristics.
Istio Configuration
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: websocket-gateway
  namespace: default
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: websocket-tls
      hosts:
        - ws.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: websocket-vs
  namespace: default
spec:
  hosts:
    - ws.example.com
  gateways:
    - websocket-gateway
  http:
    - match:
        - uri:
            prefix: /ws
      route:
        - destination:
            host: websocket-service
            port:
              number: 8080
      timeout: 0s # Disable timeout for WebSocket; upgrades are handled automatically by Envoy
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: websocket-dr
  namespace: default
spec:
  host: websocket-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
        h2UpgradePolicy: DO_NOT_UPGRADE # Upgrading backend connections to HTTP/2 breaks WebSocket tunnelling
    loadBalancer:
      consistentHash: # consistentHash and simple are mutually exclusive; use hashing for affinity
        httpCookie:
          name: 'session-affinity'
          ttl: 3600s
    outlierDetection:
      consecutiveErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 50
---
apiVersion: v1
kind: Service
metadata:
  name: websocket-service
  namespace: default
  labels:
    app: websocket
spec:
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
      name: http-websocket # Important: name must include 'http' for Istio
  selector:
    app: websocket
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600
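The mutual TLS mentioned at the start of this section is configured separately from routing. A minimal sketch of a PeerAuthentication policy that requires mTLS for traffic to the WebSocket pods (the policy name is illustrative):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: websocket-mtls
  namespace: default
spec:
  selector:
    matchLabels:
      app: websocket
  mtls:
    mode: STRICT # Sidecars reject plaintext traffic to these pods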
Linkerd Configuration
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: websocket-authz
  namespace: default
spec:
  server:
    selector:
      matchLabels:
        app: websocket
  client:
    meshTLS:
      identities:
        - 'cluster.local/ns/default/sa/websocket-client'
---
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: websocket-server
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: websocket
  port: 8080
  proxyProtocol: 'HTTP/1.1' # WebSocket requires HTTP/1.1
---
apiVersion: v1
kind: Service
metadata:
  name: websocket-service
  namespace: default
  annotations:
    linkerd.io/inject: enabled
    config.linkerd.io/proxy-cpu-request: '100m'
    config.linkerd.io/proxy-memory-request: '20Mi'
    config.linkerd.io/proxy-cpu-limit: '1'
    config.linkerd.io/proxy-memory-limit: '250Mi'
spec:
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  selector:
    app: websocket
WebSocket Service and Deployment
Deploying WebSocket applications in Kubernetes requires careful consideration of pod distribution, resource allocation, and connection handling strategies. Unlike stateless HTTP services, WebSocket applications maintain persistent connections that can span hours or days, requiring specialized deployment configurations to ensure high availability and graceful scaling. The following deployment patterns optimize for connection stability while maintaining the flexibility to handle varying traffic loads and service updates.
Complete WebSocket Application Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-app
  namespace: default
  labels:
    app: websocket
spec:
  replicas: 3
  selector:
    matchLabels:
      app: websocket
  template:
    metadata:
      labels:
        app: websocket
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9090'
        prometheus.io/path: '/metrics'
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - websocket
                topologyKey: kubernetes.io/hostname
      containers:
        - name: websocket-server
          image: websocket-app:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: websocket
              protocol: TCP
            - containerPort: 9090
              name: metrics
              protocol: TCP
          env:
            - name: PORT
              value: '8080'
            - name: MAX_CONNECTIONS
              value: '10000'
            - name: PING_INTERVAL
              value: '30000'
          resources:
            requests:
              memory: '256Mi'
              cpu: '250m'
            limits:
              memory: '512Mi'
              cpu: '1000m'
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ['/bin/sh', '-c', 'sleep 15']
---
apiVersion: v1
kind: Service
metadata:
  name: websocket-service
  namespace: default
  labels:
    app: websocket
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
      name: websocket
    - port: 9090
      targetPort: 9090
      protocol: TCP
      name: metrics
  selector:
    app: websocket
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800 # 3 hours
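To keep voluntary disruptions such as node drains and cluster upgrades from severing too many active connections at once, a PodDisruptionBudget can accompany the deployment. A minimal sketch:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: websocket-pdb
  namespace: default
spec:
  minAvailable: 2 # Always keep at least two pods serving connections
  selector:
    matchLabels:
      app: websocket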
Horizontal Pod Autoscaling
Scaling WebSocket applications presents unique challenges compared to traditional stateless services. Since WebSocket connections are bound to specific pods, scaling decisions must account for connection distribution and avoid disrupting active sessions. Effective autoscaling strategies balance resource utilization with connection stability, using custom metrics that reflect the actual load characteristics of WebSocket traffic rather than relying solely on CPU and memory metrics.
HPA Configuration for WebSocket Applications
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: websocket-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: websocket-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: websocket_connections
        target:
          type: AverageValue
          averageValue: '1000' # Scale when avg connections > 1000
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 5
          periodSeconds: 60
      selectPolicy: Max
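Once applied, scaling behavior can be observed directly; the websocket_connections metric will only report values after the custom metrics pipeline described below is in place:

# Watch metric values and replica counts as they change
kubectl get hpa websocket-hpa --watch

# Inspect scaling events and conditions
kubectl describe hpa websocket-hpa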
Monitoring and Observability
Comprehensive monitoring of WebSocket applications in Kubernetes requires specialized metrics and alerting strategies tailored to connection-based workloads. Traditional monitoring approaches focused on request-response patterns don’t adequately capture the behavior of long-lived WebSocket connections. Effective observability solutions track connection lifecycle events, message throughput patterns, error rates, and resource utilization trends specific to persistent connection workloads.
Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: websocket-metrics
  namespace: default
spec:
  selector:
    matchLabels:
      app: websocket
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: websocket-dashboard
  namespace: monitoring
data:
  dashboard.json: |
    {
      "dashboard": {
        "title": "WebSocket Metrics",
        "panels": [
          {
            "title": "Active Connections",
            "targets": [{ "expr": "sum(websocket_connections_active)" }]
          },
          {
            "title": "Connection Rate",
            "targets": [{ "expr": "rate(websocket_connections_total[5m])" }]
          },
          {
            "title": "Message Rate",
            "targets": [{ "expr": "rate(websocket_messages_total[5m])" }]
          },
          {
            "title": "Error Rate",
            "targets": [{ "expr": "rate(websocket_errors_total[5m])" }]
          }
        ]
      }
    }
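On the alerting side, a PrometheusRule can flag abnormal connection behavior. A sketch that assumes the same metric names as the dashboard above, with illustrative thresholds:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: websocket-alerts
  namespace: default
spec:
  groups:
    - name: websocket
      rules:
        - alert: WebSocketConnectionDrop
          # Fires when active connections fall by more than half within 5 minutes
          expr: sum(websocket_connections_active) < 0.5 * sum(websocket_connections_active offset 5m)
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: Active WebSocket connections dropped sharply
        - alert: WebSocketHighErrorRate
          expr: rate(websocket_errors_total[5m]) > 5
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Elevated WebSocket error rate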
Custom Metrics for HPA
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: custom-metrics
data:
  config.yaml: |
    rules:
      - seriesQuery: 'websocket_connections_active{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^websocket_connections_active"
          as: "websocket_connections"
        metricsQuery: 'avg_over_time(websocket_connections_active{<<.LabelMatchers>>}[1m])'
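With the Prometheus adapter serving the custom metrics API, the renamed metric can be checked before the HPA consumes it:

# Query the custom metrics API for the per-pod connection count
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/websocket_connections"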
Network Policies
Network security for WebSocket applications requires careful policy design to balance security with operational requirements. Unlike HTTP applications that typically handle short-lived requests, WebSocket applications maintain persistent connections that traverse network boundaries for extended periods. Effective network policies must account for these long-lived connections while restricting unnecessary traffic and preventing lateral movement in case of security breaches.
WebSocket Network Policy Configuration
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: websocket-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: websocket
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
        - namespaceSelector:
            matchLabels:
              name: istio-system
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 9090
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 53 # DNS
        - protocol: UDP
          port: 53 # DNS
    - to:
        - podSelector:
            matchLabels:
              app: redis # If using Redis for pub/sub
      ports:
        - protocol: TCP
          port: 6379
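Policy behavior can be spot-checked from a throwaway pod. Since the policy above only admits ingress from the ingress-nginx, istio-system, and monitoring namespaces, this probe should time out when run from the default namespace:

# Expect a timeout from a namespace the policy does not allow
kubectl run netpol-test --rm -it --image=busybox --restart=Never -- \
  wget -qO- -T 5 http://websocket-service.default.svc.cluster.local:8080/health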
TLS/SSL Configuration
Securing WebSocket connections with TLS encryption is essential for production deployments, particularly when handling sensitive data or operating in regulated environments. Certificate management in Kubernetes environments requires automation to handle certificate renewals and distribution across multiple ingress points. The cert-manager project provides robust certificate lifecycle management with support for various certificate authorities and automated renewal processes.
Certificate Management with cert-manager
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: websocket-tls
  namespace: default
spec:
  secretName: websocket-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  commonName: ws.example.com
  dnsNames:
    - ws.example.com
    - '*.ws.example.com'
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
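After applying these resources, issuance can be verified with the commands below. Note that ACME only validates wildcard names via a dns01 solver, so with the http01 solver shown here the wildcard dnsName should be dropped or a dns01 solver added.

# Check certificate status and issuance events
kubectl describe certificate websocket-tls

# Confirm the TLS secret was created
kubectl get secret websocket-tls -o jsonpath='{.type}'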
Testing WebSocket Connections
Comprehensive testing of WebSocket deployments in Kubernetes requires both functional and performance validation to ensure applications can handle expected loads while maintaining connection stability. Testing strategies should cover connection establishment, message throughput, failover scenarios, and scaling behavior under various load conditions. Automated testing frameworks help validate deployment configurations and detect regressions before they impact production environments.
Test Pod for WebSocket
apiVersion: v1
kind: Pod
metadata:
  name: websocket-test
  namespace: default
spec:
  containers:
    - name: wscat
      image: node:alpine
      command: ["/bin/sh"]
      args: ["-c", "npm install -g wscat && sleep infinity"]

# Test from within cluster
kubectl exec -it websocket-test -- wscat -c ws://websocket-service:8080/ws

# Test through ingress
wscat -c wss://ws.example.com/ws
Load Testing with K6
import { check } from 'k6';
import ws from 'k6/ws';

export let options = {
  stages: [
    { duration: '30s', target: 100 }, // Ramp up
    { duration: '1m', target: 100 },  // Stay at 100 connections
    { duration: '30s', target: 0 },   // Ramp down
  ],
};

export default function () {
  const url = 'wss://ws.example.com/ws';
  const params = { tags: { my_tag: 'websocket' } };

  const res = ws.connect(url, params, function (socket) {
    socket.on('open', () => {
      console.log('Connected');
      socket.send('Hello Server!');
    });

    socket.on('message', (data) => {
      console.log('Message received: ', data);
    });

    socket.on('close', () => {
      console.log('Disconnected');
    });

    socket.on('error', (e) => {
      console.log('Error: ', e.error());
    });

    socket.setTimeout(() => {
      socket.close();
    }, 10000);
  });

  check(res, { 'Connected successfully': (r) => r && r.status === 101 });
}
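Saved as websocket-load-test.js (the filename is arbitrary), the script runs against the ingress endpoint:

# Run the load test with k6 installed locally
k6 run websocket-load-test.js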
Troubleshooting
Diagnosing WebSocket issues in Kubernetes environments requires understanding both the application-level WebSocket protocol behavior and the underlying Kubernetes networking stack. Common problems often stem from misconfigurations in timeout settings, session affinity, or ingress controller annotations. Systematic troubleshooting approaches help isolate whether issues originate from the application code, Kubernetes configuration, or network infrastructure.
Common Issues and Solutions
- Connection immediately closes
# Check ingress logs
kubectl logs -n ingress-nginx deployment/nginx-ingress-controller

# Verify annotations
kubectl describe ingress websocket-ingress

# Test without TLS
kubectl port-forward service/websocket-service 8080:8080
wscat -c ws://localhost:8080/ws
- 502 Bad Gateway
# Check service endpoints
kubectl get endpoints websocket-service

# Verify pods are running
kubectl get pods -l app=websocket

# Check pod logs
kubectl logs -l app=websocket --tail=50
- Session affinity not working
# Verify session affinity configuration
kubectl get service websocket-service -o yaml | grep -A 5 sessionAffinity

# Check ingress cookie settings
kubectl get ingress websocket-ingress -o yaml | grep -i cookie
- High latency or timeouts
# Check resource usage
kubectl top pods -l app=websocket

# Review HPA status
kubectl get hpa websocket-hpa

# Check network policies
kubectl get networkpolicy -o wide
Debug Commands
# Enable debug logging for NGINX Ingress
kubectl -n ingress-nginx edit configmap nginx-configuration
# Add: error-log-level: debug

# Capture traffic with tcpdump
kubectl exec -it websocket-pod -- tcpdump -i any -w /tmp/capture.pcap port 8080

# Test DNS resolution
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup websocket-service

# Check ingress controller version
kubectl -n ingress-nginx get deployment nginx-ingress-controller -o jsonpath='{.spec.template.spec.containers[0].image}'
Best Practices
- Use appropriate ingress controller: NGINX for simplicity and performance, Traefik for automatic service discovery and dynamic configuration, or HAProxy for enterprise-grade load balancing requirements
- Configure session affinity: Essential for stateful WebSocket connections to ensure clients reconnect to the same backend pods and maintain application state consistency
- Set proper timeouts: Configure extended timeout values appropriate for WebSocket connections, which can remain active for hours or days depending on application requirements
- Implement health checks: Ensure pods are ready and healthy before receiving traffic, with checks that validate WebSocket endpoint availability rather than just basic HTTP responses
- Use HPA carefully: WebSocket connections are stateful and bound to specific pods, so scale gradually and consider connection distribution when scaling policies trigger
- Monitor connection metrics: Track active connections, connection rates, message throughput, and resource usage patterns specific to WebSocket workloads for informed scaling decisions
- Implement graceful shutdown: Allow adequate time for existing connections to close cleanly during pod termination to prevent data loss and client reconnection storms
- Use network policies: Restrict traffic to necessary ports and sources while allowing for the long-lived nature of WebSocket connections across network boundaries
- Enable TLS/SSL: Always use WSS (WebSocket Secure) in production environments to protect data in transit and maintain client trust and regulatory compliance
- Test failover scenarios: Regularly validate behavior during pod restarts, network partitions, and ingress controller updates to ensure application resilience and recovery capabilities
Additional Resources
- NGINX Ingress Controller Documentation
- Traefik Documentation
- HAProxy Ingress Documentation
- Istio WebSocket Support
- Kubernetes Networking
This guide is maintained by Matthew O’Riordan, Co-founder & CEO of Ably, the real-time data platform. For corrections or suggestions, please open an issue.