Kubernetes Patterns for High Availability

When designing applications for Kubernetes, high availability is often a critical requirement. In this post, I'll share some battle-tested patterns that we've implemented at Meituan to keep our services available during node failures, zone outages, and maintenance operations.

Pod Anti-Affinity

One of the most basic patterns for high availability is ensuring your pods don't all run on the same node. This can be achieved using pod anti-affinity:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - my-app
      topologyKey: "kubernetes.io/hostname"
```

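This configuration ensures that no two pods with the label app=my-app are scheduled on the same node. Because the rule is a hard requirement, any replica beyond the number of eligible nodes stays Pending until another node becomes available.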
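To show where the affinity block lives, here is a minimal sketch of a Deployment that carries it. The Deployment name, replica count, and image are assumptions made for illustration; the important detail is that the pod template's labels match the anti-affinity labelSelector.

```yaml
# Illustrative Deployment; name, replicas, and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app   # must match the anti-affinity labelSelector below
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - my-app
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: my-app
        image: registry.example.com/my-app:1.0.0   # hypothetical image
```

With three replicas, this sketch needs at least three schedulable nodes for every pod to be placed.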

Topology Spread Constraints

For more advanced distribution of pods across failure domains, you can use topology spread constraints:

```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: my-app
```

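With maxSkew: 1, the scheduler keeps the number of app=my-app pods in any zone within one of the least-populated zone, so your pods end up spread nearly evenly across availability zones. Because whenUnsatisfiable is DoNotSchedule, a pod that would break this limit stays Pending instead of landing in an already crowded zone.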
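Several constraints can be listed together, and a hard zone constraint is often paired with a softer per-node one. The following is a sketch of that combination rather than a recommended setting; the second constraint and its ScheduleAnyway policy are assumptions added for illustration.

```yaml
topologySpreadConstraints:
# Hard constraint: zones may differ by at most one matching pod.
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: my-app
# Soft constraint (illustrative): prefer an even spread across nodes,
# but still schedule the pod if that is not possible.
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: my-app
```

As a worked case, five replicas across three zones would settle at 2/2/1 under the zone constraint, since any other placement leaves some zone more than one pod above the minimum.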

Pod Disruption Budgets

To protect your application during voluntary disruptions (like node drains during maintenance), use Pod Disruption Budgets (PDBs):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # or use maxUnavailable
  selector:
    matchLabels:
      app: my-app
```

This ensures that at least 2 pods of your application remain available during voluntary disruptions.
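The maxUnavailable alternative mentioned in the comment can be sketched as follows. A single PDB accepts only one of minAvailable and maxUnavailable, so this is a variant of the budget above rather than an addition to it; the value 1 is an assumption chosen for illustration.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1   # illustrative value; a percentage such as "25%" also works
  selector:
    matchLabels:
      app: my-app
```

Expressing the budget as maxUnavailable tends to hold up better when the replica count changes, because the allowed disruption stays fixed while the number of pods that must stay up grows with the Deployment.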

Readiness Probes

Properly configured readiness probes ensure that traffic is only sent to pods that are ready to handle requests:

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```
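For context, here is a sketch of the same probe inside a container spec, with the timeout and threshold fields written out. The container name, image, and the three extra values are assumptions for illustration, not tuned recommendations.

```yaml
# Hypothetical container spec; name, image, and thresholds are illustrative.
containers:
- name: my-app
  image: registry.example.com/my-app:1.0.0
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 2      # a check fails if there is no response within 2s
    failureThreshold: 3    # marked not ready after 3 consecutive failures
    successThreshold: 1    # marked ready again after 1 successful check
```

With these values, a pod that stops responding is taken out of Service endpoints after roughly 30 seconds (three failed checks, ten seconds apart).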

Circuit Breaking

At Meituan, we've implemented circuit breaking at multiple levels:

Service Mesh Level: Using Istio to implement circuit breaking between services
Application Level: Using libraries like resilience4j to implement circuit breaking in the application code
Database Level: Implementing connection pooling with circuit breaking to protect databases from overload
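As an illustration of the service mesh level, circuit breaking in Istio is typically configured through a DestinationRule that combines connection pool limits with outlier detection. The sketch below is not our production configuration; the resource name, host, and every threshold are hypothetical values picked for the example.

```yaml
# Illustrative only: name, host, and all limits are hypothetical.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app-circuit-breaker
spec:
  host: my-app.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100            # cap on connections to the service
      http:
        http1MaxPendingRequests: 50    # cap on queued requests
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5          # eject a host after 5 consecutive 5xx responses
      interval: 10s                    # how often hosts are evaluated
      baseEjectionTime: 30s            # minimum time an ejected host stays out
      maxEjectionPercent: 50           # never eject more than half of the hosts
```

The connection pool limits reject excess load early, while outlier detection temporarily removes misbehaving instances from the load-balancing pool.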

Conclusion

High availability in Kubernetes requires a multi-layered approach. By combining these patterns, you can build resilient applications that can withstand various types of failures.

In my next post, I'll dive deeper into disaster recovery strategies for stateful applications on Kubernetes.