Distributing Kubernetes Watch Events with Server-Side Sharding in v1.36


Overview

As your Kubernetes cluster grows to tens of thousands of nodes, controllers that watch high-cardinality resources like Pods encounter a significant scaling barrier. In a typical setup, every replica of a horizontally scaled controller receives the complete event stream from the API server. This forces each replica to deserialize, process, and then discard the objects it doesn't manage—a costly waste of CPU, memory, and network bandwidth. Scaling out the controller does not reduce this per-replica overhead; it multiplies it.


Kubernetes v1.36 introduces a new alpha feature (KEP-5866) called server-side sharded list and watch. With this feature enabled, the API server filters events at the source. Each controller replica receives only the slice of the resource collection it owns, drastically reducing resource consumption and allowing clusters to scale further.

This tutorial explains why client-side sharding falls short, how server-side sharding works, and how you can use it in your controllers to achieve better scalability.

Prerequisites

Before diving into the implementation, ensure you have the following:

- A Kubernetes v1.36 (or newer) test cluster with the alpha feature gate for server-side sharded list and watch enabled on the API server
- kubectl access to that cluster
- A Go development environment with a client-go version matching v1.36
- Basic familiarity with controllers and shared informers

Step-by-Step Instructions

1. Understanding the Shard Selector

The core of this feature is a new field ShardSelector in ListOptions. You provide a string that defines a hash range using the shardRange() function:

shardRange(<field_path>, <start_hex>, <end_hex>)

The API server computes a deterministic 64-bit FNV-1a hash of the value at the specified field path (currently object.metadata.uid or object.metadata.namespace). Only objects whose hash falls within the half-open interval [start, end) are returned in list responses and watch event streams. Because the hash function is consistent across all API server replicas, this feature works safely with multiple API servers.
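As a mental model for the filtering described above, the following self-contained Go sketch hashes a field value with 64-bit FNV-1a and applies the half-open interval check. The exact bytes hashed and the evaluation details are defined by KEP-5866; `hashUID` and `inShard` are illustrative names, not part of any Kubernetes API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashUID models the deterministic 64-bit FNV-1a hash the API server
// is described as computing over the selected field value.
func hashUID(uid string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(uid))
	return h.Sum64()
}

// inShard reports whether a hash falls in the half-open interval [start, end).
func inShard(hash, start, end uint64) bool {
	return hash >= start && hash < end
}

func main() {
	uid := "9f2c4e1a-7b3d-4c8e-a1f0-000000000000" // hypothetical UID
	h := hashUID(uid)
	fmt.Printf("hash=0x%016X lowerHalf=%v\n",
		h, inShard(h, 0x0000000000000000, 0x8000000000000000))
}
```

Because the hash is deterministic, every API server replica places a given object in the same shard, which is what makes the feature safe in HA control planes.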

2. Configuring an Informer to Use Server-Side Sharding

Controllers typically use informers to list and watch resources. To shard the workload, each replica injects the ShardSelector into the ListOptions used by its informers via WithTweakListOptions. Here's a Go example:

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
)

// Example: Replica A handles the lower half of the UID hash space
shardSelector := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"

factory := informers.NewSharedInformerFactoryWithOptions(client, resyncPeriod,
    informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
        opts.ShardSelector = shardSelector
    }),
)

// Create an informer for Pods (or any resource)
informer := factory.Core().V1().Pods().Informer()

For a 2-replica deployment, you would assign:

- Replica A: shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')
- Replica B: shardRange(object.metadata.uid, '0x8000000000000000', '0xFFFFFFFFFFFFFFFF')

Each replica then processes only its assigned slice.

3. Deploying Multiple Replicas with Unique Ranges

When deploying your controller with multiple replicas, ensure each replica is configured with a distinct, non-overlapping hash range, and that the ranges together cover the entire 64-bit space. Typically, you pass the replica index as an environment variable or command-line flag and construct the range accordingly. Example logic in Go:

totalReplicas := uint64(3)
replicaIndex, _ := strconv.ParseUint(os.Getenv("REPLICA_INDEX"), 10, 64) // "0", "1", or "2"
shardSize := ^uint64(0) / totalReplicas
start := shardSize * replicaIndex
end := shardSize * (replicaIndex + 1)
// Integer division truncates, so pin the last replica's end to the
// maximum value; otherwise the top of the hash space would be unassigned.
if replicaIndex == totalReplicas-1 {
    end = ^uint64(0)
}
shardSelector := fmt.Sprintf("shardRange(object.metadata.uid, '0x%016X', '0x%016X')", start, end)

A StatefulSet is the simplest way to supply these values, since each pod gets a stable ordinal in its name that an entrypoint script can parse into REPLICA_INDEX; with a Deployment, you need an external mechanism to assign a unique index to each replica.
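The range arithmetic above is easy to get subtly wrong, so it is worth validating in isolation. The sketch below computes the bounds for each replica and pins the last shard's end to the maximum 64-bit value so truncation from integer division does not leave the top of the space unassigned. `shardBounds` and `selectorFor` are hypothetical helpers; only the shardRange() selector syntax comes from the feature itself.

```go
package main

import "fmt"

// shardBounds returns the [start, end) hash bounds for replica i of n.
// The last replica's end is pinned to MaxUint64 because integer
// division truncates and would otherwise leave a gap at the top.
func shardBounds(i, n uint64) (start, end uint64) {
	size := ^uint64(0) / n
	start = size * i
	end = size * (i + 1)
	if i == n-1 {
		end = ^uint64(0)
	}
	return start, end
}

// selectorFor builds the shardRange selector string for replica i of n.
func selectorFor(i, n uint64) string {
	start, end := shardBounds(i, n)
	return fmt.Sprintf("shardRange(object.metadata.uid, '0x%016X', '0x%016X')", start, end)
}

func main() {
	// Print the selector each of three replicas would use.
	for i := uint64(0); i < 3; i++ {
		fmt.Println(selectorFor(i, 3))
	}
}
```

Running this for three replicas prints three adjacent, non-overlapping ranges, which you can paste into each replica's configuration or compute at startup from the replica index.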

4. Verifying That Sharding Is Working

After deploying, check that each replica receives only its assigned objects. You can enable API server audit logging to see the shardSelector field in list/watch requests. Alternatively, add debug logging in your controller to count the number of events received per replica. Compare with the expected distribution based on the hash of UIDs.
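To compare observed per-replica event counts against the expected distribution, you can tally which shard each object's UID hashes into. The sketch below assumes, as described earlier, that the server hashes the UID string with 64-bit FNV-1a; `expectedCounts` is a hypothetical helper and the UIDs are made up for illustration — in practice you would list real UIDs from the cluster.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// expectedCounts tallies how many of the given UIDs hash into each of n
// equal shards of the 64-bit hash space, mirroring the server-side split.
func expectedCounts(uids []string, n uint64) []int {
	size := ^uint64(0) / n
	counts := make([]int, n)
	for _, uid := range uids {
		h := fnv.New64a()
		h.Write([]byte(uid))
		shard := h.Sum64() / size
		if shard >= n { // the truncation remainder belongs to the last shard
			shard = n - 1
		}
		counts[shard]++
	}
	return counts
}

func main() {
	// Hypothetical UIDs; list the real ones from your cluster instead.
	uids := []string{"uid-a", "uid-b", "uid-c", "uid-d", "uid-e", "uid-f"}
	fmt.Println(expectedCounts(uids, 2))
}
```

If a replica's debug counters diverge significantly from these expected tallies, the likely causes are an overlapping or gapped range assignment, or the feature gate not being enabled on every API server.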

Common Mistakes

- Overlapping hash ranges: two replicas process the same objects, causing duplicate work and conflicting writes.
- Gaps in coverage: ranges that do not jointly cover the full 64-bit space leave some objects unwatched by any replica.
- Forgetting to enable the alpha feature gate on every API server replica, so some requests go unfiltered.
- Sharding on object.metadata.namespace when objects are concentrated in a few namespaces, which produces heavily unbalanced shards.

Summary

Server-side sharded list and watch in Kubernetes v1.36 moves event filtering from the client to the API server, enabling each controller replica to receive only the objects it owns. This significantly reduces CPU, memory, and network overhead compared to client-side sharding. By configuring a shard selector based on a deterministic hash of the object UID, you can horizontally scale controllers without multiplying resource consumption. Enable the alpha feature gate, assign non-overlapping hash ranges to replicas, and enjoy a more scalable Kubernetes.
